Abstract

We study a new problem, the wakeup problem, that
seems to be very fundamental in distributed computing.
We present efficient solutions to the problem and
show how these solutions can be used to solve the consensus
problem, the leader election problem, and other
related problems. The main question we try to answer
is: how much memory is needed to solve the wakeup
problem? We assume a model that captures important
properties of real systems that have been largely ignored
by previous work on cooperative problems.

1  Introduction 

1.1  The  Wakeup  Problem 

The wakeup problem is a deceptively simple new problem
that seems to be very fundamental to distributed
computing. The goal is to design a t-resilient protocol
for n asynchronous processes in a shared memory
environment such that at least p processes eventually
learn that at least r processes have waked up and begun
participating in the protocol. Put another way, the
wakeup problem with parameters n, t, r and p is to find
a protocol such that in any fair run of n processes with
at most t failures, at least p processes eventually know
that at least r processes have taken at least one step in
the past. The only kind of failures we consider are crash
failures, in which a process may become faulty at any
time during its execution, and when it fails, it simply
stops participating in the protocol.

In  the  wakeup  problem,  it  is  known  a  priori  by  all  pro¬ 
cesses  that  at  least  n  —  t  processes  will  eventually  wake 
up.  The  goal  is  simply  to  have  a  point  in  time  at  which 
the  fact  that  at  least  r  processes  have  already  waked 
up  is  known  to  p  processes.  It  is  not  required  that  this 
time  be  the  earliest  possible,  and  faulty  processes  are 
included  in  the  counts  of  processes  that  have  waked  up 
and  that  know  about  that  fact.  Note  that  in  a  solution 
to  the  wakeup  problem,  at  least  p  —  t  correct  processes 
eventually  learn  that  at  least  r  —  t  correct  processes  are 
awake  and  participating  in  the  protocol. 

The  significance  of  this  problem  is  two-fold.  First,  it 
seems  generally  useful  to  have  a  protocol  such  that  after 
a  crash  of  the  network  or  after  a  malicious  attack,  the 
remaining  correct  processes  can  figure  out  if  sufficiently 
many  other  processes  remain  active  to  carry  out  a  given 
task.  Second,  a  solution  to  this  problem  is  a  useful 
building  block  for  solving  other  important  problems  (cf. 
section  6). 

1.2  A  New  Model 

Much work to date on fault-tolerant parallel and distributed
systems has been generous in the class of faults
considered but rather strict in the requirements on the
system itself. Problems are usually studied in an underlying
model that is fully synchronous, provides each
process with a unique name that is known to all other
processes, and is initialized to a known state at time
zero. We argue that none of these assumptions is realistic
in today’s computer networks, and achieving them
even within a single parallel computer is becoming increasingly
difficult and costly. Large systems do not run
off a single clock and hence are not synchronous. Providing
processes with unique id’s is costly and difficult
and greatly complicates reconfiguring the system. Finally,
simultaneously resetting all of the computers and
communication channels in a large network to a known
initial state is virtually impossible and would rarely be
done even if it were possible because of the large destructive
effects it would have on ongoing activities.

Our new model of computation makes none of these
assumptions. It consists of a fully asynchronous collection
of n identical anonymous deterministic processes
that communicate via a single finite-sized shared register
which is initially in an arbitrary unknown state.
Access to the shared register is via atomic “test-and-set”
instructions which, in a single indivisible step, read
the value in the register and then write a new value that
can depend on the value just read.
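The access primitive described above can be sketched in a few lines of Python. This is only an illustration of the read-then-write step: the class name `SharedRegister`, the integer encoding of register values, and the use of a lock to stand in for hardware atomicity are assumptions made for the sketch, not part of the model.

```python
import threading
from typing import Callable


class SharedRegister:
    """A finite-valued shared register with a single atomic operation."""

    def __init__(self, num_values: int, initial: int) -> None:
        if not 0 <= initial < num_values:
            raise ValueError("initial value outside the register's range")
        self.num_values = num_values
        self._value = initial            # arbitrary: protocols must not rely on it
        self._lock = threading.Lock()    # stands in for hardware atomicity

    def test_and_set(self, update: Callable[[int], int]) -> int:
        """Atomically read the value, write update(value), return the value read."""
        with self._lock:
            old = self._value
            new = update(old)
            if not 0 <= new < self.num_values:
                raise ValueError("new value outside the register's range")
            self._value = new
            return old
```

A process step then amounts to one call such as `reg.test_and_set(lambda v: (v + 1) % 4)`, where the new value depends only on the value just read and on the caller's private state.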

Assuming an arbitrary unknown initial state relates
to the notion of self-stabilizing systems defined by Dijkstra
[8]. However, Dijkstra considers only non-terminating
control problems such as the mutual exclusion
problem, whereas we show how to solve decision
problems such as the wakeup, consensus and leader election
problems, in which a process makes an irrevocable
decision after a finite number of steps.

Before  proceeding,  we  should  address  two  possible 
criticisms  of  shared  memory  models  in  general  and  our 
model  in  particular.  First,  most  computers  implement 
only  reads  and  writes  to  memory,  so  why  do  we  consider 
atomic  test-and-set  instructions?  One  answer  is  that 
large parallel systems access shared memory through
a communication network which may well possess independent
processing power that enables it to implement
more powerful primitives than just simple reads
and writes. Indeed, such machines have been seriously
proposed [23, 44]. Another answer is that part of our
interest is in exploring the boundary between what can
and cannot be done, and a proof of impossibility for
a machine with test-and-set access to memory shows a
fortiori the corresponding impossibility for the weaker
read/write model.

A second possible criticism is that real distributed
systems are built around the message-passing paradigm
and that shared memory models are unrealistic for large
systems. Again we have several possible answers. First,
the premise may not be correct. Experience is showing
that message-passing systems are difficult to program,
so increasing attention is being paid to implementing
shared memory models, either in hardware (e.g. the Fluent
machine [45]) or in software (e.g. the Linda system
[5]). Second, message-passing systems are themselves
an abstraction that may not accurately reflect the realities
of the underlying hardware. For example, message-passing
systems typically assume infinite buffers for incoming
messages, yet nothing is infinite in a real system,
and indeed overflow of the message buffer is one kind of
fault to which real systems are subject. It is difficult to
see how to study a kind of fault which is assumed away
by the model. Finally, at the lowest level, communication
hardware looks very much like shared memory.
For example, a wire from one process to another can
be thought of as a binary shared register which the first
process can write (by injecting a voltage) and the second
process can read (by sensing the voltage).


1.3  Space  Complexity  Results 

The main question we try to answer is: how many values
v for the shared register are necessary and sufficient to
solve the wakeup problem? The answer both gives a
measure  of  the  communication-space  complexity  of  the 
problem  and  also  provides  a  way  of  assessing  the  cost 
of  achieving  reliability.  We  give  a  brief  overview  of  our 
results  below. 

1.3.1  Fault-Free  Solutions 

First we examine what can be done in the absence of
faults (i.e., t = 0). We present a solution to the wakeup
problem in which one process learns that all other processes
are awake (i.e., p = 1 and r = n), and it uses a
single 4-valued register (i.e., v = 4). The protocol for
achieving this is quite subtle and surprising. It can also
be modified to solve the leader election problem. Based
on this protocol, we construct a fault-free protocol that
reaches consensus on one out of k possible values using
a 5-valued register. Finally, we show that there is no
fault-free solution to the wakeup problem with only two
values (i.e., one bit) when r > 3.

1.3.2  Fault-Tolerant  Solutions:  Upper  Bounds 

We start by showing that the fault-free solution which
uses a single 4-valued register, mentioned in the previous
section, can actually tolerate t failures for any
$r < ((2n-2)/(2t+1) + 1)/2$. Using many copies of this
protocol, we construct a protocol with $v = 8^{t+1}$ that
tolerates t faults when $r < n - t$. Thus, if t is a constant,
then a constant sized shared memory is sufficient,
independent of n. However, the constant grows exponentially
with t. An easy protocol exists with v = n that
works for any t and $r < n - t$. This means that the above
exponential result is only of interest for $t \ll \log n$. Finally,
we show that for any t < n/2, there is a t-resilient
solution to the wakeup problem for any $r < \lfloor n/2 \rfloor + 1$,
using a single $O(t)$-valued register.
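As a rough check on when the exponential construction is smaller than the easy n-valued one, and assuming the register size of the first construction reads $8^{t+1}$, the crossover can be worked out directly:

```latex
\[
  8^{t+1} < n
  \iff t + 1 < \log_8 n
  \iff t < \tfrac{1}{3}\log_2 n - 1 .
\]
% Example: n = 2^{12} = 4096 gives t < 3, so t \le 2:
% 8^{3} = 512 < 4096, whereas 8^{4} = 4096 is not smaller than n.
```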

1.3.3  Fault-Tolerant  Solutions:  A  Lower  Bound 

We prove that for any protocol P that solves the wakeup
problem for parameters n, t and r, the number of shared
memory values used by P is at least $W^{\alpha}$, where
$W = (t\sqrt{t} - t)/(n - t)$ and $\alpha = 1/\log_2((n-t)/t + 3)$. The proof
is quite intricate and involves showing for any protocol
with too few memory values that there is a run in which
$n - t$ processes wake up and do not fail, yet no process
can distinguish that run from another in which fewer
than r wake up; hence, no process knows that r are
awake.

1.4 Relation to Other Problems

We establish connections between the wakeup problem
and two fundamental problems in distributed computing:
the consensus problem and the leader election
problem. These two problems lie at the core of
many problems for fault-tolerant distributed applications
[1, 7, 10, 13, 16, 20, 21, 32, 34, 43, 42, 48].

We  show  that:  (1)  any  protocol  that  uses  v  values 
and  solves  the  wakeup  problem  for  t  <  n/2,  r  >  n/2 
and  p  =  1  can  be  transformed  into  t-resilient  consensus 
and  leader  election  protocols  which  use  8v  values;  and 
(2)  any  t-resilient  consensus  and  leader  election  protocol 
that  uses  v  values  can  be  transformed  into  a  t-resilient 
protocol which uses 4v values and solves the wakeup
problem for any $r < \lfloor n/2 \rfloor + 1$ and p = 1.

Using the first result above, we can construct efficient
solutions to both the consensus and leader election
problems  from  solutions  for  the  wakeup  problem.  The 
second  result  implies  that  the  lower  bound  proved  for 
the  wakeup  problem  holds  for  these  other  two  problems. 
As  a  consequence,  the  consensus  and  the  leader  election 
problems  are  space-equivalent  in  our  model.  This  is 
particularly  surprising  since  the  two  problems  seem  so 
different.  The  difficulty  in  leader  election  is  breaking 
symmetry,  whereas  consensus  is  inherently  symmetric. 

2  Definitions  and  Notations 

2.1  Protocols  and  Knowledge 

An n-process protocol P = (C, N, R) consists of a
nonempty set C of runs, an n-tuple $N = (q_1, \ldots, q_n)$
of process id’s (or processes, for short), and an n-tuple
$R = (R_1, \ldots, R_n)$ of sets of registers. Informally, $R_i$
includes all the registers that process $q_i$ can access. We
assume throughout this paper that n > 2.

A run is a pair of the form (f, S) where f is a
function which assigns initial values to the registers in
$R_1 \cup \ldots \cup R_n$ and S is a finite or infinite sequence of
events. (When S is finite, we also say that the run is
finite.) An event $e = (q_i, v, r, v')$ means that process $q_i$,
in one atomic step, first reads a value v from register
r and then writes a value v' into register r. We say
that the event e involves process $q_i$ and that process $q_i$
performs a test-and-set operation on register r.

The set of runs is assumed to satisfy several properties;
for example, it should be prefix closed. Because of
lack  of  space,  we  do  not  give  a  complete  list  here  but 
point  out  that  these  properties  capture  the  fact  that 
we  are  assuming  that  the  processes  are  anonymous  and 
identically  programmed,  are  not  synchronized,  and  that 
nothing  can  be  assumed  about  the  initial  state  of  the 
shared  memory. 

The value of a register at a finite run is the last value
that was written into that register, or its initial value if
no process wrote into the register. A register r is said
to be local if there exists an i such that $r \in R_i$ and
for any $j \neq i$, $r \notin R_j$. A register is shared if it is not
local. In this paper we restrict attention to protocols
which have exactly one register which is shared by all
the processes (i.e., $|R_1 \cap \ldots \cap R_n| = 1$) and all other
registers are local. If S' is a prefix of S then the run
(f, S') is a prefix of (f, S), and (f, S) is an extension of
(f, S'). For any sequence S, let $S_i$ be the subsequence
of S containing all events in S which involve $q_i$.

Definition: Computations (f, S) and (f', S') are equivalent
with respect to $q_i$, denoted by (f, S) ~ (f', S'), iff
$S_i = S'_i$.

We  are  now  ready  to  define  the  notion  of  knowledge  in 
a  shared  memory  environment.  In  the  following,  we  use 
predicate  to  mean  a  set  of  runs. 

Definition: For a process $q_i$, predicate b and finite run
$\rho$, process $q_i$ knows b at $\rho$ iff for all $\rho'$ such that $\rho \sim \rho'$,
it is the case that $\rho' \in b$.
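The knowledge definition can be made concrete on a small, explicitly listed set of runs. The sketch below is illustrative only: it assumes a single shared register (so the register component of an event is dropped), encodes register values as integers, and uses hypothetical helper names (`view`, `equivalent`, `knows`, `awake`). A real protocol's run set is prefix closed and much larger than the two runs used here.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[int, int, int]          # (process index i, value read, value written)
Run = Tuple[int, Tuple[Event, ...]]   # (initial register value f, event sequence S)


def view(run: Run, i: int) -> Tuple[Event, ...]:
    """S_i: the subsequence of events that involve process i."""
    return tuple(e for e in run[1] if e[0] == i)


def equivalent(a: Run, b: Run, i: int) -> bool:
    """Two runs are equivalent w.r.t. process i iff i's events coincide."""
    return view(a, i) == view(b, i)


def knows(i: int, predicate: Set[Run], run: Run, runs: Iterable[Run]) -> bool:
    """Process i knows the predicate iff it holds in every run i cannot tell apart."""
    return all(other in predicate for other in runs if equivalent(run, other, i))


def awake(run: Run) -> Set[int]:
    return {e[0] for e in run[1]}


# Process 1 reads 1 and writes 2 in both runs; only the first has process 0 awake.
two = (0, ((0, 0, 1), (1, 1, 2)))
solo = (1, ((1, 1, 2),))
runs = {two, solo}
at_least_two = {r for r in runs if len(awake(r)) >= 2}
```

Here `knows(1, at_least_two, two, runs)` is false: process 1 sees the same events in both runs, and the arbitrary initial value 1 in `solo` mimics the write by process 0. Process 0, whose views differ, does know that two processes are awake.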

For  simplicity,  we  assume  that  a  process  always  takes 
a  step  whenever  it  is  scheduled.  A  process  that  takes 
infinitely  many  steps  in  a  run  is  said  to  be  correct  in 
that  run;  otherwise  it  is  faulty.  We  say  that  an  infinite 
run  is  l-fair  iff  at  least  l  processes  are  correct  in  it. 

2.2  Wakeup,  Consensus  and  Leader 
Election  Protocols 

In this subsection we formally define the notions of t-resilient
wakeup, consensus and leader election protocols
(0 < t < n). We say that a process $q_i$ is awake in a
run if the run contains an event that involves $q_i$. The
predicate “at least r processes are awake in run $\rho$” is the
set of all runs for which there exist r different processes
which are awake in the run. Note that a process that
fails after taking a step is nevertheless considered to be
awake in the run.

•  A wakeup protocol with parameters n, t, r and p is
a protocol for n processes such that, for any $(n - t)$-fair
run $\rho$, there exists a finite prefix of $\rho$ in which
at least p processes know that at least r processes
are awake in $\rho$.

It is easy to see that a wakeup protocol exists only
if $\max(p, r) \le n - t$, and hence, from now on, we
assume that this is always the case. We also assume
that $\min(p, r) \ge 1$.

In the following, whenever we speak about a solution
to the wakeup problem without mentioning p,
we are assuming that p = 1.

•  A t-resilient k-consensus protocol is a protocol for
n processes, where each process has a local read-only
input register and a local write-once output
register. For any $(n - t)$-fair run there exists a finite
prefix in which all the correct processes decide
on some value from a set of size k (i.e., each correct
process writes a decision value into its local
output register), the decision values written by all
processes are the same, and the decision value is
equal to the input value of some process.

In the following, whenever we say “consensus”
(without mentioning a specific k) we mean “binary
consensus”, where the possible decision values are
0 and 1.

•  A t-resilient leader election protocol is a protocol
for n processes, where each process has a local
write-once output register. For any $(n - t)$-fair run
there exists a finite prefix in which all the correct
processes decide on some value in {0,1}, and exactly
one (correct or faulty) process decides on 1.
That process is called the leader.
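The output conditions in the consensus and leader election definitions can be stated as small checks on the contents of the output registers at the end of a finite prefix. The following is a sketch under assumptions: the function names are hypothetical, `None` stands for an output register that has not been written, and `correct` is the set of indices of correct processes.

```python
from typing import Optional, Sequence, Set


def valid_consensus(inputs: Sequence[int],
                    decisions: Sequence[Optional[int]],
                    correct: Set[int]) -> bool:
    """All correct processes decided, all written values agree, value is some input."""
    written = [d for d in decisions if d is not None]
    return (all(decisions[i] is not None for i in correct)
            and len(set(written)) <= 1
            and all(d in inputs for d in written))


def valid_leader_election(decisions: Sequence[Optional[int]],
                          correct: Set[int]) -> bool:
    """All correct processes decided in {0, 1}; exactly one process decided 1."""
    written = [d for d in decisions if d is not None]
    return (all(decisions[i] is not None for i in correct)
            and all(d in (0, 1) for d in written)
            and written.count(1) == 1)
```

For example, with inputs `[0, 1, 1]`, decisions `[1, 1, None]` and correct set `{0, 1}`, the consensus check passes even though the third (faulty) process never wrote its output.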

3  Fault-free  solutions 

In this section, we develop the See-Saw protocol, which
solves the fault-free wakeup problem using a single 4-valued
shared register. Then we show how the See-Saw
protocol can be used to solve the k-valued consensus
problem. Finally, we claim that it is impossible to solve
the wakeup problem using only one shared bit.

To understand the See-Saw protocol, the reader
should imagine a playground with a See-Saw in it. The
processes will play the protocol on the See-Saw, adhering
to strict rules. When each process enters the playground
(wakes up), it sits on the up-side of the See-Saw
causing it to swing to the ground. Only a process on the
ground (or down-side) can get off and when it does the
See-Saw must swing to the opposite orientation. These
rules enforce a balance invariant which says that the
number of processes on each side of the See-Saw differs
by at most one (the heavier side always being down).

Each  process  enters  the  playground  with  two  tokens. 
The  protocol  will  force  the  processes  on  the  bottom  of 
the  See-Saw  to  give  away  tokens  to  the  processes  on 
the  top  of  the  See-Saw.  Thus,  token  flow  will  change 
direction  depending  on  the  orientation  of  the  See-Saw. 
Tokens  can  be  neither  created  nor  destroyed.  The  idea 
of  the  protocol  is  to  cause  tokens  to  concentrate  in  the 
hands of a single process. A process seeing 2k tokens
knows that at least k processes are awake. Hence, if it
is guaranteed that eventually some process will see at
least 2r tokens, the protocol is by definition a wakeup
protocol  with  parameter  r,  even  if  the  process  does  not 
know  the  value  of  r  and  hence  does  not  know  when  the 
goal  has  been  achieved. 

Following is the complete description of the See-Saw
protocol. The 4-valued shared register is easily interpreted
as two bits which we call the “token bit” and the
“See-Saw bit”. The two states of the token bit are called
“token present” and “no token present”. We think of
a public token slot which either contains a token or is
empty, according to the value of the token bit. The two
states of the See-Saw bit are called “left side down” and
“right side down”. The See-Saw bit describes a virtual
See-Saw which has a left and a right side. The bit
indicates which side is down (implying that the opposite
side is up).

Each process remembers in private memory the number
of tokens it currently possesses and which of four
states it is currently in with respect to the See-Saw:
“never been on”, “on left side”, “on right side”, and “got
off”. A process is said to be on the up-side of the See-Saw
if it is currently “on left side” and the See-Saw bit
is in state “right side down”, or it is currently “on right
side” and the See-Saw bit is in state “left side down”.
A process initially possesses two tokens and is in state
“never been on”.

We define the protocol by a list of rules. When a process
is scheduled, it looks at the shared register and at
its own internal state and carries out the first applicable
rule, if any. If no rule is applicable, it takes a null
step which leaves its internal state and the value in the
shared register unchanged.

Rule  1:  (Start  of  protocol)  Applicable  if  the  scheduled 
process  is  in  state  “never  been  on”.  The  process 
gets  on  the  up-side  of  the  See-Saw  and  then  flips  the 
See-Saw  bit.  By  “get  on”,  we  mean  that  the  process 
changes  its  state  to  “on  left  side”  or  “on  right  side” 
according  to  whichever  side  is  up.  Since  flipping 
the  See-Saw  bit  causes  that  side  to  go  down,  the 
process  ends  up  on  the  down-side  of  the  See-Saw. 

Rule 2: (Emitter) Applicable if the scheduled process
is on the down-side of the See-Saw, has one or
more tokens, and the token slot is empty. The
process flips the token bit (to indicate that a token
is present) and decrements by one the count
of tokens it possesses. If its token count thereby
becomes zero, the process flips the See-Saw bit and
gets off the See-Saw by setting its state to “got off”.

Rule 3: (Absorber) Applicable if the scheduled process
is  on  the  up-side  of  the  See-Saw  and  a  token  is 
present  in  the  token  slot.  The  process  flips  the 
token  bit  (to  indicate  that  a  token  is  no  longer 
present)  and  increments  by  one  the  count  of  tokens 
it  possesses. 

Note that if a scheduled process is on the down-side,
has 2k - 1 tokens, and a token is present in the token
slot, then, although no rule is applicable, the process
nevertheless sees a total of 2k tokens and hence knows
that k processes have waked up.

The two main ideas behind the protocol can be stated
as invariants.

TOKEN  INVARIANT:  The  number  of  tokens  in  the 
system is either 2n or 2n + 1 and does not change at
any  time  during  the  protocol.  (The  number  of  tokens  in 
the  system  is  the  total  number  of  tokens  possessed  by 
all  of  the  processes,  plus  1  if  a  token  is  present  in  the 
token  bit  slot.) 

BALANCE  INVARIANT:  The  number  of  processes  on 
the  left  and  right  sides  of  the  See-Saw  is  either  perfectly 
balanced  or  favors  the  down-side  of  the  See-Saw  by  one 
process. 

Theorem 3.1: Let t = 0. The See-Saw protocol uses
a 4-valued shared register and is a wakeup protocol for
n, t, r (and p = 1), where n and r are arbitrary and t =
0. (Note that the rules for the protocol do not mention
n or r.)

In applications of wakeup protocols, it is often desirable
for the processes to know the value of r so that
a process learning that r processes are awake can stop
participating in the wakeup protocol and take some action
based on that knowledge. The See-Saw protocol
can be easily modified to have this property by adding
a termination rule immediately after Rule 1:

Rule 1a: (End of protocol) Applicable if the scheduled
process is on the See-Saw and sees at least 2r tokens,
where the number of tokens the process sees
is the number it possesses, plus one if a token is
present in the token slot. The process thus knows
that r processes have waked up. It gets off the See-Saw
(i.e., terminates) by setting its state to “got
off”.
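Rules 1, 1a, 2 and 3, together with the two invariants, can be exercised in a small fault-free simulation. The sketch below makes several assumptions that are not in the text: the two bits of the register are held in the Python variables `token` and `down`, the default scheduler is round-robin (one particular fair schedule), the simulation stops and returns as soon as Rule 1a fires instead of setting the state to “got off”, and all names (`Process`, `see_saw`, `other`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable, Optional, Tuple

# LEFT/RIGHT name both a process's side and the side that is currently down.
NEVER, LEFT, RIGHT, OFF = "never been on", "on left side", "on right side", "got off"


@dataclass
class Process:
    tokens: int = 2          # every process enters with two tokens
    state: str = NEVER


def other(side: str) -> str:
    return RIGHT if side == LEFT else LEFT


def see_saw(n: int, r: int, token: bool = False, down: str = LEFT,
            schedule: Optional[Iterable[int]] = None,
            max_steps: int = 100_000) -> Optional[Tuple[int, int]]:
    """Run the rules until some process on the See-Saw sees at least 2r tokens.

    Returns (process index, number of steps), or None if max_steps runs out.
    The initial values of `token` and `down` are arbitrary.
    """
    procs = [Process() for _ in range(n)]
    total = 2 * n + token                        # token invariant: 2n or 2n + 1
    steps = cycle(range(n)) if schedule is None else schedule
    for step, i in zip(range(1, max_steps + 1), steps):
        p = procs[i]
        if p.state == NEVER:                     # Rule 1: sit on the up-side, flip
            p.state = other(down)
            down = other(down)
        elif p.state == OFF:
            pass                                 # null step
        elif p.tokens + token >= 2 * r:          # Rule 1a: r processes are awake
            return i, step
        elif p.state == down and p.tokens >= 1 and not token:   # Rule 2: emitter
            token, p.tokens = True, p.tokens - 1
            if p.tokens == 0:
                down, p.state = other(down), OFF
        elif p.state != down and token:          # Rule 3: absorber
            token, p.tokens = False, p.tokens + 1
        # Check the token and balance invariants after every step.
        assert sum(q.tokens for q in procs) + token == total
        on_down = sum(q.state == down for q in procs)
        on_up = sum(q.state == other(down) for q in procs)
        assert on_down - on_up in (0, 1)
    return None
```

With two processes and the default initial register, `see_saw(2, 2)` returns `(0, 7)`: process 0 absorbs both of process 1's tokens and sees four tokens on its fourth turn. Passing a different `schedule` (any iterable of process indices) lets one try other interleavings.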

The  See-Saw  protocol  can  also  be  used  to  solve  the 
leader  election  problem  by  electing  the  first  process  that 
sees 2n tokens. By adding a 5th value, everyone can be
informed  that  the  leader  was  elected,  and  the  leader 
can  know  that  everyone  knows.  Now,  the  leader  can 
transmit  an  arbitrary  message,  for  example  a  consensus 
value,  to  all  the  other  processes  without  using  any  more 
new  values  through  a  kind  of  serial  protocol.  This  leads 
to  our  next  theorem. 

Theorem  3.2:  In  the  absence  of  faults,  it  is  possible 
to  reach  consensus  on  one  of  k  values  using  a  single 
5-valued  shared  register. 

Finally,  we  claim  that  the  See-Saw  protocol  cannot  be 
improved  to  use  only  a  single  binary  register.  A  slightly 
weaker result than Theorem 3.3 was also proved by Joe
Halpern [27]. The question whether 3 values suffice is
still open.

Theorem  3.3:  There  does  not  exist  a  solution  to  the 
wakeup  problem  which  uses  only  a  single  binary  register 
when  r  >  3. 


4  Fault-tolerant  solutions 


In this section, we explore solutions to the wakeup problem
which can tolerate t > 0 process failures.

The See-Saw protocol, presented in the previous section,
cannot tolerate even a single crash failure for any
r > n/3. The reason is that the faulty process may
fail after accumulating 2n/3 tokens, trapping two other
processes on one side of the See-Saw, each with 2n/3
tokens. When r < n/3, the See-Saw protocol can tolerate
at least one failure. As the parameter r decreases,
the number of failures that the protocol can tolerate
increases. This fact is captured in our first theorem.

Theorem 4.1: The See-Saw protocol is a wakeup protocol
for n, t, r, where $r < ((2n - 1)/(2t + 1) + 1)/2$.

We  note  that  the  See-Saw  protocol  can  tolerate  up  to 
n/2  —  1  initial  failures  [21,  49].  In  the  rest  of  this  sec¬ 
tion,  we  present  three  t-resilient  solutions  to  the  wakeup 
problem.  Notice  that  when  the  number  of  failures  t  is 
a  constant,  it  is  possible  using  a  constant  number  of 
values  for  one  process  to  learn  that  n  —  t  processes  are 
awake. 

Theorem 4.2: For any t < n/6, there is a wakeup protocol
which uses a single $8^{t+1}$-valued register and works
for any $r < n - t$.

Theorem 4.3: For any t < n, there is a wakeup protocol
which uses a single n-valued register and works for
any $r < n - t$.

Theorem 4.4: For any t < n/2, there is a wakeup protocol
which uses a single $O(t)$-valued register and works
for any $r < \lfloor n/2 \rfloor + 1$.


5  A  Lower  Bound 


In  this  section,  we  establish  a  lower  bound  on  the  num¬ 
ber  of  shared  memory  values  needed  to  solve  the  wakeup 
problem,  where  only  one  process  is  required  to  learn 
that  r  processes  are  awake,  assuming  t  processes  may 
crash  fail  (thus  p  =  1).  To  simplify  the  presentation, 
we assume that 9 < t < 2n/3 and r > n/3. Also, recall
that we already assumed that $r \le n - t$. For the rest of
this  section,  let 


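$$W = \frac{t\sqrt{t} - t}{n - t}, \qquad U = \frac{t^2}{4(n - t)}, \qquad (1)$$

$$\alpha = \frac{1}{\log_2\left(\frac{n - t}{t} + 3.5\right)}. \qquad (2)$$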

Note  that  W  <  U  since  t  >  9. 

Theorem 5.1: Let P be a wakeup protocol with parameters
n, t and r. Let V be the set of shared memory
values used by P. Then $|V| \ge W^{\alpha}$.

When we take t to be a constant fraction of n we get
the following immediate corollary.

Corollary 5.1: Let P be a wakeup protocol with parameters
n, t and r, where t = n/c. Let V be the
set of shared memory values used by P and let
$\gamma = 1/(2 \log_2(c + 2.5))$. Then $|V| = \Omega(n^{\gamma})$.
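To get a feel for the exponent in the corollary, it can be evaluated for a few values of c. The helper below is a sketch: the function name `gamma` is hypothetical, and the guard c > 1.5 simply mirrors the section's assumption t < 2n/3.

```python
import math


def gamma(c: float) -> float:
    """Exponent in the corollary's Omega(n^gamma) bound for t = n/c."""
    if c <= 1.5:
        raise ValueError("the section assumes t < 2n/3, i.e. c > 1.5")
    return 1.0 / (2.0 * math.log2(c + 2.5))


for c in (5.5, 13.5):
    print(c, gamma(c))
```

Since c + 2.5 is a power of two in both cases, the exponents are 1/6 for c = 5.5 and 1/8 for c = 13.5: the bound weakens slowly as the fraction of faulty processes shrinks.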

Theorem 5.1 is immediate if V is infinite, so we assume
throughout this section that V is finite. The proof
consists of several parts. First we define a sequence of
directed graphs whose nodes are shared memory values
in V. Each component C of each graph in the sequence
has a cardinality $k_C$ and a weight $w_C$. We establish by
induction that $k_C \ge \min(w_C, W)^{\alpha}$. Finally, we argue
that in the last graph in the sequence, every component
C has weight $w_C \ge W$. Hence, $|V| \ge k_C \ge W^{\alpha}$.

5.1  Reachability  Graphs  and  Terminal 
Graphs 

Let  V  be  the  alphabet  of  the  shared  register.  We  say 
that a value $a \in V$ appears m times in a given run if
there  are  (at  least)  m  different  prefixes  of  that  run  where 
the  value  of  the  shared  register  is  a. 

$a \xrightarrow{u} b$ denotes that there exists a run in which at
most u processes participate, the initial value of
the shared register is a, and the value b appears at
least once.

$a \stackrel{u}{\Rightarrow} b$ denotes that there exists a run in which exactly
u processes participate, each process that participates
takes infinitely many steps, the initial value
of the shared register is a, and the value b appears
infinitely many times.

Clearly, $a \stackrel{u}{\Rightarrow} b$ implies $a \xrightarrow{u} b$ but not vice versa. Also,
for every a, there exists b such that $a \stackrel{u}{\Rightarrow} b$.

We use the following graph-theoretic notions. A directed
multigraph¹ G is weakly connected if the underlying
undirected multigraph of G is connected. A multigraph
G'(V', E') is a subgraph of G(V, E) if $E' \subseteq E$ and
$V' \subseteq V$. A multigraph G' is a component of a multigraph
G if it is a weakly connected subgraph of G and
for any edge (a, b) in G, either both a and b are nodes
of G' or both a and b are not in G'. A node is a root
of a multigraph if there is a directed path from every
other node in the multigraph to that node. A rooted
graph (rooted component) is a graph (component) with
at least one root. A labeled multigraph is a multigraph
together with a label function that assigns a weight in $\mathbb{N}$
to each edge of G. The weight of a labeled multigraph
is the sum of the weights of its edges.
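The notions of root and weight are easy to compute on small examples. The sketch below assumes a labeled multigraph is given as a node set plus a list of (from, to, weight) triples, so parallel edges are simply repeated triples; the function names `roots` and `weight` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, int]    # (from node, to node, weight)


def roots(nodes: Iterable[Hashable], edges: List[Edge]) -> Set[Hashable]:
    """Nodes that every other node can reach along a directed path."""
    reverse = defaultdict(set)
    for a, b, _ in edges:
        reverse[b].add(a)
    all_nodes = set(nodes)
    found = set()
    for candidate in all_nodes:
        # Search backwards from the candidate: everything found can reach it.
        seen, queue = {candidate}, deque([candidate])
        while queue:
            x = queue.popleft()
            for y in reverse[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        if seen == all_nodes:
            found.add(candidate)
    return found


def weight(edges: List[Edge]) -> int:
    """Weight of a labeled multigraph: the sum of its edge weights."""
    return sum(w for _, _, w in edges)
```

For instance, with edges `[("a", "b", 2), ("a", "b", 3), ("c", "b", 1)]` on nodes a, b, c, the only root is b and the weight is 6; the two parallel edges from a to b both count.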

We  now  define  the  notion  of  a  reachability  graph  of  a 
given  protocol  P. 

¹ A multigraph can have several edges from a to b.


Definition: Let V be the set of shared memory values
of protocol P. The reachability graph G of protocol P is
the labeled directed multigraph with node set V which
has an edge from node a to node b labeled with r iff
$a \stackrel{r}{\Rightarrow} b$ holds. (Note that there may be several edges
with different labels between the same two nodes. Note
also that G is finite since $a \stackrel{r}{\Rightarrow} b$ implies that r < |V|.)

Definition: A graph $C$ is closed at node $a$ w.r.t. $G$ if $a$
is in $C$ and for every node $b$ in $G$, if $(a, b)$ is an edge of
$G$ then $b$ is in $C$.

Definition:  A  multigraph  T  is  terminal  w.r.t.  G  if  T  is 
a  subgraph  of  G ,  all  of  T's  components  are  rooted,  and 
T  has  a  component  C  with  root  a  among  its  minimal 
weight  components  that  is  closed  at  node  a  w.r.t.  G. 
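
As a minimal illustration of the closedness condition used in the definition of a terminal multigraph, the sketch below checks whether a node set is closed at a node. It assumes edges of $G$ are `(a, b, label)` triples; `closed_at` is a hypothetical helper name.

```python
def closed_at(c_nodes, a, g_edges):
    """True if a is in C and every edge of G leaving a ends inside C."""
    return a in c_nodes and all(b in c_nodes
                                for (x, b, _label) in g_edges if x == a)

# Toy graph: a has edges to b and c, so {a, b} is not closed at a.
G = [("a", "b", 1), ("a", "c", 2), ("b", "a", 1)]
```

Here `closed_at({"a", "b", "c"}, "a", G)` holds, while `closed_at({"a", "b"}, "a", G)` does not because of the edge from `a` to `c`.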

In the rest of the section we show that the reachability
graph $G$ of any wakeup protocol with parameters
$n, t, r$ has size $\ge W^{\alpha}$. We do that by constructing a
multigraph $T$ which is terminal w.r.t. $G$ and has size
$\ge W^{\alpha}$. Theorem 5.1 follows from these facts.

5.2  Reachability  Graphs 

The  reachability  graphs  are  defined  for  all  protocols. 
Now  we  concentrate  on  such  graphs  constructed  from 
wakeup  protocols  only.  We  show  that  when  the  weight 
of  a  rooted  component,  say  C,  is  sufficiently  small,  an 
edge  exists  with  a  label  q  from  a  root  of  C  to  a  node 
not  in  C,  and  we  can  bound  the  size  of  q. 

For later reference we call the following three inequalities,

(i) $pq + (p - 1)w \le n$,

(ii) $pq \ge n - t$,

(iii) $\max(q, w) < r$

the zigzag inequalities. These inequalities play an important
role in our exposition.
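
A direct check of the three zigzag inequalities can be sketched as follows. The function name `zigzag` and the argument order are illustrative assumptions; the parameter values in the example are arbitrary and chosen only to show one satisfying and one failing pair.

```python
def zigzag(p, q, w, n, t, r):
    """True if positive integers p, q satisfy (i), (ii) and (iii)."""
    cond_i = p * q + (p - 1) * w <= n    # enough processes for the run
    cond_ii = p * q >= n - t             # at most t processes are faulty
    cond_iii = max(q, w) < r             # no group ever sees r processes
    return cond_i and cond_ii and cond_iii

# Illustrative parameters: n = 30, t = 12, r = 11, component weight w = 1.
ok = zigzag(p=9, q=2, w=1, n=30, t=12, r=11)     # 18 + 8 = 26 <= 30
bad = zigzag(p=18, q=1, w=1, n=30, t=12, r=11)   # 18 + 17 = 35 > 30
```

With these values `ok` is `True` and `bad` is `False`, because the second choice violates (i).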

Lemma 5.1: Given reachability graph $G$ of a wakeup
protocol $P$ with parameters $n, t, r$ and a rooted subgraph
$C$ of $G$ with root $a$ and weight $w$, if there exist positive
integers $p$ and $q$ that satisfy the zigzag inequalities, then
for any node $b$ of $G$, if $a \stackrel{q}{\Longrightarrow} b$ is an edge of $G$ then $b$ is
not in $C$.

Proof: We assume to the contrary that there exist $p$
and $q$ that satisfy the zigzag inequalities, and there is
an edge $a \stackrel{q}{\Longrightarrow} b$ such that $b$ belongs to $C$. Let $\rho$ be a
$q$-fair run starting from $a$ in which exactly $q$ processes
participate and $b$ is written infinitely often. Since $b$ is
in $C$, there is a path from $b$ to $a$ such that the sum of
all the labels of edges in that path is at most $w$, and
hence $b \stackrel{w}{\longrightarrow} a$. This allows us to construct a run with
$pq$ non-faulty processes starting with $a$ as follows:

Run $q$ processes according to $\rho$ until $b$ is written.
Run $w$ processes until $a$ is written. (This
must be possible since $b \stackrel{w}{\longrightarrow} a$.) Let these $w$
processes fail. Run a second group of $q$ processes
according to $\rho$ until $b$ is written. Run
a second group of $w$ processes until $a$ is written,
and let them fail. Repeat the above until
the $p$-th group of $q$ processes have just been run
and $b$ has again been written. At this point,
$pq$ processes belong to still-active groups, and
$(p - 1)w$ processes have died. If any processes
remain, let them die now without taking any
steps. Now, an infinite run $\rho'$ on the active processes
can be constructed by continuing to run
the first group according to $\rho$ until $b$ is written
again, then doing the same for the second
through $p$-th groups, and repeating this cycle
forever.

The result is a $pq$-fair run. Moreover, no reliable process
can distinguish this run from $\rho$, and hence no reliable
process ever knows (in $\rho'$) that more than $q$ processes
are awake. Also, obviously, no faulty process knows that
more than $w$ processes are awake. Since $\max(q, w) < r$
but at least $pq \ge n - t \ge r$ processes are awake in $\rho'$,
this leads to a contradiction to the assumption that $P$
is a wakeup protocol. □
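
The bookkeeping in the run constructed in this proof can be sketched in code. The sketch below only allocates process identifiers to the alternating groups and counts who stays active and who crashes; it does not simulate any protocol. The function name `zigzag_schedule` and the phase descriptions are illustrative assumptions.

```python
def zigzag_schedule(n, p, q, w):
    """Allocate n process ids to p groups of q and p-1 groups of w.

    Returns (phases, active, crashed) for the finite prefix of the run.
    """
    if p * q + (p - 1) * w > n:
        raise ValueError("inequality (i) fails: not enough processes")
    ids = iter(range(n))
    phases, active, crashed = [], [], []
    for i in range(p):
        group = [next(ids) for _ in range(q)]
        phases.append(("run until b is written", group))
        active.extend(group)
        if i < p - 1:
            helpers = [next(ids) for _ in range(w)]
            phases.append(("run until a is written, then crash", helpers))
            crashed.extend(helpers)
    crashed.extend(ids)   # leftover processes crash without taking a step
    return phases, active, crashed

# Illustrative parameters: n = 30, t = 12, p = 9, q = 2, w = 1.
phases, active, crashed = zigzag_schedule(n=30, p=9, q=2, w=1)
```

For these values 18 processes remain active and 12 crash, so inequality (ii) keeps the number of failures within $t = 12$.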

Lemma 5.2: Assume $w \le U$. Then the inequality

$$x^2 - tx + w(n - t) \le 0 \tag{3}$$

has a positive integer solution. The smallest positive
integer solution for (3) is

$$q = \left\lceil \frac{t - \sqrt{t^2 - 4w(n - t)}}{2} \right\rceil. \tag{4}$$

There exists a positive integer $p$ such that $p$ and $q$ satisfy
the zigzag inequalities.


Proof: We first show that (3) has a positive solution.
Using the quadratic formula, we get that the roots of
the quadratic in (3) are

$$\frac{t - \sqrt{t^2 - 4w(n - t)}}{2} \quad \text{and} \quad \frac{t + \sqrt{t^2 - 4w(n - t)}}{2}.$$

Since $w \le U$, the discriminant $t^2 - 4w(n - t) \ge 1$. Since
the value of the discriminant is less than $t^2$, it follows
that the roots are positive. Moreover, the difference
of the two roots is at least 1; hence there is a positive
integer $x$ satisfying (3), and $q$ is the least such integer.
Moreover, since $t$ is an integer, $t/2$ is either an integer or
lies exactly half way between two integers, so (4) holds.

Next we show that there exists a positive integer $p$
such that $p$ and $q$ satisfy the inequalities (i) and (ii).
Let $p = \lceil (n - t)/q \rceil$. The choice of $p$ clearly satisfies
(ii). Also from (3) it follows that

$$p = \left\lceil \frac{n - t}{q} \right\rceil \le \frac{n - t + q}{q} \le \frac{n + w}{q + w},$$

which implies (i).

Finally, we show that inequality (iii) is satisfied. Recall
that we assume that $t \le 2n/3$ and $r > n/3$. It
follows from these assumptions that $r > t/2$. Since
$q \le t/2$, obviously $q < r$. Also, since $w \le U$ and
$t \le 2n/3$, substituting in (1) gives $w < n/3$, and hence
$w < r$. □
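
The choice of $q$ and $p$ in Lemma 5.2 can be computed with integer arithmetic only. The sketch below is an illustration under stated assumptions: it takes $w$ directly and requires the discriminant to be at least 1 (which the lemma derives from $w \le U$), it replaces the square root by `math.isqrt`, which gives the same ceiling because $t$ minus the integer square root is an integer, and it clamps $q$ to 1 for the degenerate case $w = 0$. The function name is hypothetical.

```python
import math

def smallest_q_and_p(n, t, w):
    """Return (q, p): least positive integer q with q*q - t*q + w*(n-t) <= 0,
    and p = ceil((n - t) / q)."""
    disc = t * t - 4 * w * (n - t)
    if disc < 1:
        raise ValueError("discriminant below 1: w is too large for Lemma 5.2")
    s = math.isqrt(disc)              # floor of the square root
    q = max(1, (t - s + 1) // 2)      # ceil((t - s) / 2), at least 1
    p = -(-(n - t) // q)              # ceil((n - t) / q)
    return q, p

# Illustrative parameters: n = 30, t = 12, w = 1 give q = 2 and p = 9.
q, p = smallest_q_and_p(30, 12, 1)
```

A brute-force search over $x = 1, 2, \ldots$ gives the same $q$ on small parameter ranges, and the resulting pair satisfies (i) and (ii); inequality (iii) additionally depends on $r$ and is not checked here.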

Lemma 5.3: If $w \le W$, then there exist positive integers
$p$ and $q$ that satisfy the zigzag inequalities and

$$q \le \frac{w(n - t)}{t + 2} + 3. \tag{5}$$

Proof: Recall that $W \le U$, so in particular, $w \le U$.
Let $q'$ be the smallest positive integer solution to (3).
It follows from Lemma 5.2 that $q'$ exists. Let $q$ be the
smallest positive integer for which there exists a positive
integer $p$ such that $p$ and $q$ satisfy the zigzag inequalities.
It follows from Lemma 5.2 that $q$ exists and $q \le q'$.

If $q = 1$ then clearly $1 \le w(n - t)/(t + 2) + 2$ and
the lemma holds. Assume $q > 1$. Since $q \le q'$ it follows
that

$$(q - 1)^2 - t(q - 1) + w(n - t) > 0.$$

Thus,

$$q < \frac{w(n - t) + q^2 + t + 1}{t + 2}. \tag{6}$$

By Lemma 5.2,

$$q \le \left\lceil \frac{t - \sqrt{t^2 - 4w(n - t)}}{2} \right\rceil. \tag{7}$$

Since $w \le W$, we can substitute $W$ for $w$ in inequality
(7) and get that $q \le \lceil \sqrt{t} \rceil < \sqrt{t} + 1$. Thus,
$q^2 < t + 2\sqrt{t} + 1$, so it follows from (6) and the assumption
that $t \ge 9$ that inequality (5) holds. □


5.3  Terminal  Graphs 

In this subsection, we show that the reachability graph
$G$ of any wakeup protocol with parameters $n, t, r$ has
at least one subgraph which is terminal w.r.t. $G$ and
has size $\ge W^{\alpha}$. We first prove that the weight of any
rooted component of any terminal graph w.r.t. $G$ is
$> U$. Then we show that this implies that there
exists a terminal graph w.r.t. $G$, all of whose rooted
components have size $\ge W^{\alpha}$.

Lemma 5.4: Let $G$ be the reachability graph of a
wakeup protocol with parameters $n, t, r$ and let $T$ be terminal
w.r.t. $G$. Any rooted component of $T$ has weight
$> U$.

Proof: Assume to the contrary that $T$ has a minimal-weight
component $C$ of weight $w \le U$. Then, by Lemma
5.2, there exist positive integers $q$ and $p$ that satisfy the
zigzag inequalities. From Lemma 5.1, there is a node $b$
not in $C$ and an edge $a \stackrel{q}{\Longrightarrow} b$ in $G$ from a root $a$ of $C$.
Therefore, $T$ is not terminal w.r.t. $G$, a contradiction. □

Lemma 5.5: Let $G$ be the reachability graph of a
wakeup protocol with parameters $n, t, r$. There exists a
graph $T$ which is terminal w.r.t. $G$, all of whose rooted
components have size $\ge W^{\alpha}$.

Proof: The following procedure constructs $T$ by adding
edges one at a time to an initial subgraph $T_0$ of $G$ until
step 2 fails. The initial subgraph $T_0$ consists of all
the nodes of $G$. For each node $a$ there is exactly one
outgoing edge $a \stackrel{1}{\Longrightarrow} b$ in $T_0$. We note two facts about
$T_0$: (1) for every edge $a \stackrel{1}{\Longrightarrow} b$, $a \ne b$, and (2) every
component of $T_0$ has at least one root. Fact (1) follows
from Lemma 5.1, choosing $q = 1$ and $p = n - t$ ($w = 0$);
while (2) follows from the fact that the outdegree of any
node is exactly one. Also, it follows from (1) that the
weight and size of any component of $T_0$ is at least 2.

At any stage of the construction, every component
of the graph built so far will have at least one root.
Added edges always start at a root and end at a node
of a different component. After adding an edge $(a, b)$,
the components of $a$ and $b$ are joined together into a
single component whose root is the root of $b$'s component,
and the weight of the new component is the sum
of the weights of the two original components plus the
label of the edge from $a$ to $b$.

Procedure for adding a new edge to $T$:

Step 1: Select an arbitrary component $C$ of minimal
weight and an arbitrary root $a$ of $C$.

Step 2: Find the smallest integer $q$ for which there is
an edge $a \stackrel{q}{\Longrightarrow} b$ in $G$ such that $b$ is not in $C$. This
step fails if no such edge exists.

Step 3: Place the edge $a \stackrel{q}{\Longrightarrow} b$ into $T$.

Let $T_i$ be a graph that is constructed after $i$ applications
of the above procedure, where $T_0$ is an initial graph as
defined above. Clearly, any such sequence $\{T_0, T_1, \ldots\}$
is finite and the last element is terminal w.r.t. $G$.
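
The three steps can be sketched as a small program. This is an illustration on a hand-made toy graph, not on the reachability graph of an actual wakeup protocol; it assumes edges are `(a, b, label)` triples, that every node has a label-1 edge to a different node, and it resolves the "arbitrary" choices by taking the first candidate. All function names are hypothetical.

```python
from collections import defaultdict

def components(nodes, edges):
    """Weakly connected components of (nodes, edges) as a list of node sets."""
    adj = defaultdict(set)
    for a, b, _ in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comps.append(comp)
    return comps

def find_root(comp, edges):
    """Some node of comp reachable from every node of comp, or None."""
    rev = defaultdict(set)
    for a, b, _ in edges:
        if a in comp and b in comp:
            rev[b].add(a)
    for r in sorted(comp):
        reach, stack = {r}, [r]
        while stack:
            x = stack.pop()
            for y in rev[x] - reach:
                reach.add(y)
                stack.append(y)
        if reach == comp:
            return r
    return None

def build_terminal(nodes, G):
    """Grow T from T_0 by Steps 1-3 until Step 2 fails; return T's edges."""
    # T_0: exactly one outgoing label-1 edge per node.
    T = [next(e for e in G if e[0] == a and e[2] == 1) for a in nodes]
    while True:
        comps = components(nodes, T)

        def comp_weight(c):
            return sum(q for src, _, q in T if src in c)

        C = min(comps, key=comp_weight)                      # Step 1
        a = find_root(C, T)
        out = [e for e in G if e[0] == a and e[1] not in C]  # Step 2
        if not out:
            return T                                         # Step 2 fails
        T.append(min(out, key=lambda e: e[2]))               # Step 3

# Toy graph with two 2-cycles of label-1 edges and a few heavier edges.
nodes = ["a", "b", "c", "d"]
G = [("a", "b", 1), ("b", "a", 1), ("c", "d", 1), ("d", "c", 1),
     ("a", "d", 3), ("a", "c", 2), ("c", "a", 2)]
T = build_terminal(nodes, G)
```

On this toy input the procedure adds the single edge `("a", "c", 2)`, the lightest edge leaving the component of `a`, and then stops because the root `c` of the merged component has no edge leaving it. Each successful step merges two components, so the loop runs at most $|V| - 1$ times.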

We prove by induction on $i$, the number of applications
of the procedure, that for any graph $T_i$, all of the
components of $T_i$ are rooted, and for any rooted component
$C$ it is the case that $k \ge \min(w, W)^{\alpha}$, $k \ge 2$
and $w \ge 2$, where $k$ is the size of $C$ and $w$ is its weight.
This together with Lemma 5.4 and the fact that $W \le U$
completes the proof.

Let $\beta = 1/\alpha$. As discussed before, each component $C$
of $T_0$ has a root and has size $k$ at least 2. The component
$C$ consists of exactly $k$ edges with label 1, so its weight
is also $k$. Hence, the base case holds since $\beta > 1$.

Since $T_0$ is a subgraph of $T_i$ which also includes all
nodes of $T_i$, it follows that the size and weight of any
rooted component of $T_i$ are both at least 2. Now, suppose
the procedure adds an edge of label $q$ from component
$C_1$ of size $k_1$ and weight $w_1$ to component $C_2$ of size
$k_2$ and weight $w_2$. By step 1, the new edge emanates
from a minimal weight component, so $w_1 \le w_2$. The
weight $w$ of the newly formed component is $w_1 + w_2 + q$,
and the number of nodes $k$ is $k_1 + k_2$. We show now
that $k \ge \min(w, W)^{\alpha}$.

Clearly, if $w_2 > W$ then $\min(w_2, W)^{\alpha} = \min(w, W)^{\alpha}$
and $k_2 < k$, so by the induction hypothesis we are done.
Hence, we assume that $w_2 \le W$, so also $w_1 \le W$. Since
$w_1 \le W$ it follows from Lemma 5.3 that there exist
positive integers $p'$ and $q'$ that satisfy the zigzag inequalities
and $q' \le (w_1(n - t))/(t + 2) + 3$; hence by
Lemma 5.1 there is an edge of label $q'$ from any root of
$C_1$ to some node not in $C_1$. Thus, by the minimality
of $q$ (the weight of the edge in step 2), it follows that
$q \le q'$ which implies that $q \le (w_1(n - t))/(t + 2) + 3$;
hence,

$$w = w_1 + w_2 + q \tag{8}$$

$$\le \frac{n - t}{t + 2}\, w_1 + w_1 + w_2 + 3 \tag{9}$$

$$\le \left( \frac{n - t}{t + 2} + 2.5 \right) w_1 + w_2. \tag{10}$$

Let $k'_1 = w_1^{\alpha}$ and $k'_2 = w_2^{\alpha}$. Then $k'_1 \le k'_2$, $w_1 = {k'_1}^{\beta}$
and $w_2 = {k'_2}^{\beta}$. We claim that

$$\left( \frac{n - t}{t + 2} + 2.5 \right) w_1 + w_2 = \left( \frac{n - t}{t + 2} + 2.5 \right) {k'_1}^{\beta} + {k'_2}^{\beta} \tag{11}$$

$$\le (k'_1 + k'_2)^{\beta}. \tag{12}$$

It is not difficult to see that since $(n - t)/(t + 2) + 3.5 = 2^{\beta}$,
equality holds for $k'_1 = k'_2$. As $k'_2$ is increased to be
larger than $k'_1$, the right side increases more rapidly than
the left side since $\beta > 1$; hence, the inequality holds.
Finally, by the induction hypothesis, $k_1 \ge w_1^{\alpha} = k'_1$
and $k_2 \ge w_2^{\alpha} = k'_2$. Hence,

$$(k'_1 + k'_2)^{\beta} \le (k_1 + k_2)^{\beta} = k^{\beta}. \tag{13}$$

Putting equations (8)-(13) together gives $w \le k^{\beta}$, so
$k \ge w^{\alpha} \ge \min(w, W)^{\alpha}$. □

Theorem 5.1 follows immediately from Lemma 5.5.

6  Relation to Other Problems

In this section we show that there are efficient reductions
between the wakeup problem for $r = \lfloor n/2 \rfloor + 1$
and the consensus and leader election problems. Hence,
the wakeup problem can be viewed as a fundamental
problem that captures the inherent difficulty of these
two problems. The following lemma shows that in order
to decide on some value in a t-resilient consensus
protocol, it is always necessary (and in some cases also
sufficient) to learn first that at least $t + 1$ processes have
waked up, and similarly in order to be elected in a
t-resilient leader election protocol, it is always necessary
to learn that at least $t + 1$ processes have waked up. An
immediate consequence of the lemma is that there is no
consensus or leader election protocol that can tolerate
$\lceil n/2 \rceil$ failures.

Lemma 6.1: (1) Any t-resilient consensus (leader election)
protocol is a t-resilient wakeup protocol for any
$r \le t + 1$ and $p = n - t$ ($p = 1$); (2) For any $t < n/3$,
there exist t-resilient consensus and leader election protocols
which are not t-resilient wakeup protocols for any
$r \ge t + 2$.

Theorem 6.1: Any protocol that solves the wakeup
problem for any $t < n/2$, $r > n/2$ and $p = 1$, using a
single $v$-valued shared register, can be transformed into
a t-resilient consensus (leader election) protocol which
uses a single $8v$-valued ($4v$-valued) shared register.

From Theorems 6.1 and 4.4, it follows that for any
$t < n/2$, there is a t-resilient consensus (leader election)
protocol that uses an $O(t)$-valued shared register.
Next we show that the converse of Theorem 6.1
also holds. That is, the existence of a t-resilient consensus
or leader election protocol which uses a single
$v$-valued shared register implies the existence of a
t-resilient wakeup protocol for any $r \le \lfloor n/2 \rfloor + 1$ which
uses a single $O(v)$-valued shared register.

Theorem 6.2: Any t-resilient protocol that solves the
consensus or leader election problem using a single
$v$-valued shared register can be transformed into a
t-resilient protocol that solves the wakeup problem for any
$r \le \lfloor n/2 \rfloor + 1$ which uses a single $4v$-valued shared
register.

It follows from Theorem 6.2 that the lower bound
we proved in Section 5 for the wakeup problem when
$r = \lfloor n/2 \rfloor + 1$ also applies to the consensus and leader
election problems. Finally, an immediate corollary of
Theorem 6.1 and Theorem 6.2 is that the consensus
and leader election problems are space-equivalent. That
is, there is a t-resilient consensus protocol that uses an
$O(t)$-valued shared register iff there is a t-resilient leader
election protocol that uses an $O(t)$-valued shared register.

7  Conclusions 

We study the fundamental new wakeup problem in a
new model where all processes are programmed alike,
there is no global synchronization, and it is not possible
to simultaneously reset all parts of the system to a
known initial state.

Our  results  are  interesting  for  several  reasons: 

•  They  give  a  quantitative  measure  of  the  cost  of 
fault-tolerance  in  shared  memory  parallel  machines 
in  terms  of  communication  bandwidth. 

•  They  apply  to  a  model  which  more  accurately  re¬ 
flects  reality. 

•  They  relate  recent  results  from  three  different  ac¬ 
tive  research  areas  in  parallel  and  distributed  com¬ 
puting,  namely: 

-  Results in shared memory systems [2, 11, 19,
31, 36, 38, 39, 46, 50, 51],

-  The theory of knowledge in distributed systems
[6, 14, 15, 17, 18, 22, 28, 25, 29, 30, 26,
33, 37, 40, 41],

-  Self stabilizing protocols [3, 4, 8, 9, 12, 24, 35,
47].

•  They  give  a  new  point  of  view  and  enable  a  deeper 
understanding  of  some  classical  problems  and  re¬ 
sults  in  cooperative  computing. 

•  They are proved using techniques that will likely
have application to other problems in distributed
computing.